Measuring and analyzing multi-dimensional sensory information for identification purposes

ABSTRACT

Methods and systems are provided for measuring multi-dimensional sensing information for identification purposes. The identity of one or more substances is determined through analysis of multidimensional data that can include, among others, intrinsic information as well as extrinsic information. The method for identification of a substance comprises utilizing pattern recognition to form descriptors to identify characteristics of the substance. A system and computer program for performing analysis of the multidimensional data are also described.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority to U.S. Provisional Patent Application Nos. 60/188,569, 60/188,588, and 60/188,589, all of which were filed on Mar. 10, 2000. The teachings of each application are hereby incorporated by reference for all purposes.

BACKGROUND OF THE INVENTION

This invention generally relates to techniques for identifying one or more substances using multidimensional data. More particularly, the present invention provides systems, methods, and computer code for classifying or identifying one or more substances using multi-dimensional data. The multidimensional data can include, among others, intrinsic information such as temperature, acidity, chemical composition, and color, as well as extrinsic information, such as origin and age. Merely by way of example, the present invention is implemented using fluid substances, but it would be recognized that the invention has a much broader range of applicability. The invention can be applied to other settings such as chemicals, electronics, biological, medical, petrochemical, gaming, hotel, commerce, machining, electrical grids, and the like.

Techniques and devices for detecting a wide variety of analytes in fluids such as vapors, gases and liquids are well known. Such devices generally comprise an array of sensors that in the presence of an analyte produce a unique output signature. Using pattern recognition algorithms, the output signature, such as an electrical response, can be correlated and compared to the known output signature of a particular analyte or mixture of substances. By comparing the unknown signature with the stored or known signatures, the analyte can be detected, identified, and quantified. Examples of such detection devices can be found in U.S. Pat. No. 5,571,401 (Lewis et al.); U.S. Pat. No. 5,675,070 (Gelperin); U.S. Pat. No. 5,697,326 (Mottram et al.); U.S. Pat. No. 5,788,833 (Lewis et al.); U.S. Pat. No. 5,807,701 (Payne et al.); and U.S. Pat. No. 5,891,398 (Lewis et al.), the disclosures of which are incorporated herein by reference.

Generally, all of these techniques rely upon a predetermined pattern recognition algorithm that compares a known signature with an unknown signature to detect and identify an unknown analyte. These techniques, however, are often cumbersome. They also require highly manual data processing techniques. Additionally, each algorithm often requires manual input before it can be used with the known signature. Furthermore, there are many different types of algorithms, several of which must often be used. These different algorithms are often incompatible with each other and cannot be used in a seamless and cost effective manner. These and many other limitations are described throughout the present specification and more particularly below.

From the above, it is seen that an improved way to identify a characteristic of a fluid substance is highly desirable.

SUMMARY OF THE INVENTION

According to the present invention, a technique including systems, methods, and computer codes for identifying one or more substances using multidimensional data is provided. More particularly, the present invention provides systems, methods, and computer codes for classifying or identifying one or more substances using multi-dimensional data. The multidimensional data can include, among others, intrinsic information such as temperature, acidity, chemical composition, olfactory information, color, and sugar content, as well as extrinsic information, such as origin and age.

In one specific embodiment, the present invention provides a system including computer code for training computing devices for classification or identification purposes for one or more substances capable of producing olfactory information. The computer code is embedded in memory, which can be at a single location or multiple locations in a distributed manner. The system has a first code directed to acquiring at least first data from a first substance and second data from a second substance to a computing device. The data are comprised of a plurality of characteristics to identify the substance. The system also includes a second code directed to normalizing at least one of the characteristics for each of the first data and the second data. Next, the system includes computer code directed to correcting at least one of the characteristics for each of the first data and the second data. A code directed to processing one or more of the plurality of characteristics for each of the first data and the second data in the computing device using pattern recognition to form descriptors to identify the first substance or the second substance also is included. For purposes of this application, the term “descriptors” includes model coefficients/parameters, loadings, weightings, and labels, in addition to other types of information. A code directed to storing the set of descriptors into a memory device coupled to the computing device is also included. The set of descriptors is for analysis purposes of one or a plurality of substances. This code and others can be used with the present invention to perform the functionality described herein as well as others.

In a further embodiment, the invention provides a computer program product or code in memory for preprocessing information for identification or classification purposes. Here, the code is stored in memory at a single location or distributed. The product includes a code directed to acquiring a voltage reading from a sensor of a sensing device. The sensor is one of a plurality of sensors that are disposed in an array. The code is also provided for determining if the voltage is outside a baseline voltage of a predetermined range. If the voltage is outside the predetermined range, the code is directed to reject the sensor of the sensing device for use in acquiring sensory information. In some embodiments, the present invention further comprises a code directed to exposing at least one of the sensors to a sample and acquiring a sample voltage from the sample; if the sample voltage is outside a predetermined sample voltage range, the code rejects the one exposed sensor. This code and others can be used with the present invention to perform the functionality described herein as well as others.

In yet another embodiment, the present invention provides a system for classifying or identifying one or more substances capable of producing olfactory information. The system includes a process manager and an input module coupled to the process manager. The input module provides at least a first data from a first substance and second data from a second substance to a computing device. The data are comprised of a plurality of characteristics to identify the substance. The system also includes a normalizing module coupled to the process manager for normalizing at least one of the characteristics for each of the first data and the second data. A pattern recognition module is coupled to the process manager for processing one or more of the plurality of characteristics for each of the first data and the second data in the computing device using pattern recognition to form descriptors to identify the first substance or the second substance. An output module is coupled to the main process manager for storing the set of descriptors into a memory device coupled to the computing device. The set of descriptors is for analysis purposes of one or a plurality of substances. Depending upon the embodiment, other modules can also exist.

In still another specific embodiment, the present invention provides a method for training computing devices for classification or identification purposes for one or more substances capable of producing olfactory information. The method includes providing at least a first data from a first substance and second data from a second substance to a computing device. The data are comprised of a plurality of characteristics to identify the substance. The method also includes normalizing at least one of the characteristics for each of the first data and the second data. Next, the method includes correcting at least one of the characteristics for each of the first data and the second data. A step of processing one or more of the plurality of characteristics for each of the first data and the second data in the computing device using pattern recognition to form descriptors to identify the first substance or the second substance also is included. The method then stores the set of descriptors into a memory device coupled to the computing device. The set of descriptors is for analysis purposes of one or a plurality of substances.

In another alternative embodiment, the present invention provides a method for teaching a system used for analyzing multidimensional information for one or more substances, e.g., liquid, vapor, fluid. The method also includes providing a plurality of different substances. Each of the different substances is defined by a plurality of characteristics to identify any one of the substances from the other substances, the plurality of characteristics being provided in electronic form. The method also includes providing a plurality of processing methods. Each of the processing methods is capable of processing each of the plurality of characteristics to provide an electronic fingerprint for each of the substances. A step of processing each of the plurality of characteristics for each of the substances through a first processing method from the plurality of processing methods to determine relationships between each of the substances through the plurality of characteristics of each of the substances from the first processing method is also included. The method further includes processing each of the plurality of characteristics for each of the substances through a second processing method to determine relationships between each of the substances through the plurality of characteristics for each of the substances from the second processing method. The method includes processing each of the plurality of characteristics for each of the substances through an nth processing method to determine relationships between each of the substances through the plurality of characteristics from each of the substances from the nth processing method. The method compares the relationships from the first processing method to the relationships from the second processing method to the relationships from the nth processing method to find the processing method that yields the largest signal to noise ratio to identify each of the substances; and selects the processing method that yielded the largest signal to noise ratio. The relationships from the selected processing method provide an improved ability to distinguish between each of the substances using the selected processing method.
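As an illustration only, the comparison of processing methods by signal to noise ratio might be sketched as follows. The specification does not define how the ratio is computed, so this sketch assumes one possible definition (variance of the per-substance mean scores divided by the mean within-substance variance). The names `signal_to_noise` and `select_method`, and the representation of each processing method as a function returning a single score, are hypothetical.

```python
from statistics import mean, pvariance


def signal_to_noise(scores_by_substance):
    """Assumed SNR: spread of substance means over mean within-substance spread."""
    class_means = [mean(v) for v in scores_by_substance.values()]
    between = pvariance(class_means)
    within = mean(pvariance(v) for v in scores_by_substance.values())
    return between / within if within > 0 else float("inf")


def select_method(methods, samples):
    """Run every processing method and keep the one with the largest SNR.

    methods: name -> callable mapping a list of characteristics to one score
    samples: substance name -> list of characteristic lists
    """
    best_name, best_snr = None, float("-inf")
    for name, method in methods.items():
        scores = {s: [method(x) for x in xs] for s, xs in samples.items()}
        snr = signal_to_noise(scores)
        if snr > best_snr:
            best_name, best_snr = name, snr
    return best_name, best_snr
```

In this sketch a method whose scores separate the substances widely, while varying little between repeated measurements of the same substance, receives the largest ratio and is selected.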

In still a further embodiment, the invention provides a method for preprocessing information for identification or classification purposes. The method includes acquiring a voltage reading from a sensor of a sensing device. The sensor is one of a plurality of sensors that are disposed in an array. The method also includes determining if the voltage is outside a baseline voltage of a predetermined range. If the voltage is outside of the predetermined range, the method rejects the sensor of the sensing device for use in acquiring sensory information. In some embodiments, the present invention further comprises exposing at least one of the sensors to a sample and acquiring a sample voltage from the sample; if the sample voltage is outside a predetermined sample voltage range, the method rejects the one exposed sensor.
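A minimal sketch of this preprocessing check is given below, assuming the voltage readings are already available as a list with one entry per sensor in the array. The function name `screen_sensors` and the representation of each predetermined range as a (low, high) pair are hypothetical and chosen only for illustration.

```python
def screen_sensors(baseline_volts, baseline_range,
                   sample_volts=None, sample_range=None):
    """Return the indices of sensors accepted for acquiring sensory information.

    A sensor is rejected if its baseline voltage falls outside baseline_range.
    If sample readings are supplied, a sensor is also rejected when its
    sample voltage falls outside sample_range.
    """
    low, high = baseline_range
    accepted = [i for i, v in enumerate(baseline_volts) if low <= v <= high]
    if sample_volts is not None and sample_range is not None:
        s_low, s_high = sample_range
        accepted = [i for i in accepted if s_low <= sample_volts[i] <= s_high]
    return accepted


# Example: sensors 1 and 2 fail the baseline check; sensor 3 fails the sample check.
kept = screen_sensors([1.0, 0.1, 2.5, 1.2], (0.5, 2.0),
                      sample_volts=[1.5, 0.0, 0.0, 4.0], sample_range=(1.0, 3.0))
```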

In yet another embodiment, the present invention provides a system for identifying a substance capable of producing olfactory information. The system includes a user interface apparatus comprising a display, a graphical user interface, and a central processor. The system further includes a process manager operably coupled to the display through the central processor. The graphical user interface is capable of inputting an information object from a client to manipulate olfaction data and displaying the identity of a test substance received from a server.

Numerous benefits are achieved by way of the present invention over conventional techniques. For example, the present invention provides an easy to use method for training a process using more than one processing technique. Further, the invention can be used with a wide variety of substances, e.g., chemicals, fluids, biological materials, food products, plastic products, household goods. Additionally, the present invention can remove a need for human intervention in deciding which variables that describe a system or process are important or not important. Depending upon the embodiment, one or more of these benefits may be achieved. These and other benefits will be described in more detail throughout the present specification and more particularly below.

Various additional objects, features and advantages of the present invention can be more fully appreciated with reference to the detailed description and accompanying drawings that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified diagram of an environmental information analysis system according to an embodiment of the present invention;

FIGS. 2 to 2A are simplified diagrams of a computing device for processing information according to an embodiment of the present invention;

FIG. 3 is a simplified diagram of computing modules for processing information according to an embodiment of the present invention;

FIG. 3A is a simplified diagram of a capturing device for processing information according to an embodiment of the present invention;

FIGS. 4A to 4E are simplified diagrams of methods according to embodiments of the present invention; and

FIGS. 5A to 5L are simplified diagrams of an illustration of an example according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION AND PREFERRED EMBODIMENTS

FIG. 1 is a simplified diagram of an environmental information analysis system 100 according to an embodiment of the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives. As shown, the system 100 includes a variety of elements such as a wide area network 109 such as, for example, the Internet, an intranet, or other type of network. Connected to the wide area network 109 is an information server 113, with terminal 102 and database 106. The wide area network allows for communication of other computers such as a client unit 112. The client can be configured with many different hardware components and can be made in many dimensions, styles and locations (e.g., laptop, palmtop, pen, server, workstation and mainframe).

Terminal 102 is connected to server 113. This connection can be by a network such as Ethernet, asynchronous transfer mode, IEEE standard 1553 bus, modem connection, universal serial bus, etc. The communication link need not be a wire but can be infrared, radio wave transmission, etc. Server 113 is coupled to the Internet 109. The Internet is shown symbolically as a cloud or a collection of server routers, computers, and other devices 109. The connection to server is typically by a relatively high bandwidth transmission medium such as a T1 or T3 line, but can also be others.

In certain embodiments, Internet server 113 and database 106 store information and disseminate it to consumer computers, e.g., over wide area network 109. The concepts of “client” and “server,” as used in this application and the industry, are very loosely defined and, in fact, are not fixed with respect to machines or software processes executing on the machines. Typically, a server is a machine or process that is providing information to another machine or process, i.e., the “client,” that requests the information. In this respect, a computer or process can be acting as a client at one point in time (because it is requesting information) and can be acting as a server at another point in time (because it is providing information). Some computers are consistently referred to as “servers” because they usually act as a repository for a large amount of information that is often requested. For example, a WEB site is often hosted by a server computer with a large storage capacity, high-speed processor and Internet link having the ability to handle many high-bandwidth communication lines.

In a specific embodiment, the network is also coupled to a plurality of sensing devices 105. Each of these sensing devices can be coupled directly to the network or through a client computer, such as client 112. Sensing devices 105 may be connected to a device such as a Fieldbus or CAN that is connected to the Internet. Alternatively, sensing devices 105 may be in wireless communication with the Internet.

Each of the sensing devices can be similar or different, depending upon the application. Each of the sensing devices is preferably an array of sensing elements for acquiring olfactory information from fluid substances, e.g., liquid, vapor, liquid/vapor. Once the information is acquired, each of the sensing devices transfers the information to server 113 for processing purposes. In the present invention, the process is performed for classifying or identifying one or more substances using the information that includes multi-dimensional data. Details of the processing hardware are shown below and illustrated by the FIGS.

FIG. 2 is a simplified diagram of a computing device for processing information according to an embodiment of the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives. Embodiments according to the present invention can be implemented in a single application program such as a browser, or can be implemented as multiple programs in a distributed computing environment, such as a workstation, personal computer or a remote terminal in a client server relationship. FIG. 2 shows computer system 210 including display device 220, display screen 230, cabinet 240, keyboard 250, and mouse 270. Mouse 270 and keyboard 250 are representative “user input devices.” Mouse 270 includes buttons 280 for selection of buttons on a graphical user interface device. Other examples of user input devices are a touch screen, light pen, track ball, data glove, microphone, and so forth. FIG. 2 is representative of but one type of system for embodying the present invention. It will be readily apparent to one of ordinary skill in the art that many system types and configurations are suitable for use in conjunction with the present invention. In a preferred embodiment, computer system 210 includes a Pentium™ class based computer, running Windows™ NT operating system by Microsoft Corporation. However, the apparatus is easily adapted to other operating systems and architectures by those of ordinary skill in the art without departing from the scope of the present invention.

As noted, mouse 270 can have one or more buttons such as buttons 280. Cabinet 240 houses familiar computer components such as disk drives, a processor, storage device, etc. Storage devices include, but are not limited to, disk drives, magnetic tape, solid state memory, bubble memory, etc. Cabinet 240 can include additional hardware such as input/output (I/O) interface cards for connecting computer system 210 to external devices, external storage, other computers or additional peripherals, which are further described below.

FIG. 2A is an illustration of basic subsystems in computer system 210 of FIG. 2. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art will recognize other variations, modifications, and alternatives. In certain embodiments, the subsystems are interconnected via a system bus 275. Additional subsystems such as a printer 274, keyboard 278, fixed disk 279, monitor 276, which is coupled to display adapter 282, and others are shown. Peripherals and input/output (I/O) devices, which couple to I/O controller 271, can be connected to the computer system by any number of means known in the art, such as serial port 277. For example, serial port 277 can be used to connect the computer system to a modem 281, which in turn connects to a wide area network such as the Internet, a mouse input device, or a scanner. The interconnection via system bus allows central processor 273 to communicate with each subsystem and to control the execution of instructions from system memory 272 or the fixed disk 279, as well as the exchange of information between subsystems. Other arrangements of subsystems and interconnections are readily achievable by those of ordinary skill in the art. System memory and the fixed disk are examples of tangible media for storage of computer programs; other types of tangible media include floppy disks, removable hard disks, optical storage media such as CD-ROMS and bar codes, and semiconductor memories such as flash memory, read-only-memories (ROM), and battery backed memory.

FIG. 3 is a simplified diagram of computing modules 300 in a system for processing information according to an embodiment of the present invention. This diagram is merely an example which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives. As shown, the computing modules 300 include a variety of processes, which couple to a process manager 314. The processes include an upload process 301, a filter process 302, a baseline process 305, a normalization process 307, a pattern process 309, and an output process 311. Other processes can also be included. Process manager also couples to data storage device 333 and oversees the processes. These processes can be implemented in software, hardware, firmware, or any combination of these in any one of the hardware devices, which were described above, as well as others.

The upload process takes data from the acquisition device and uploads them into the main process manager 314 for processing. Here, the data are in electronic form. In embodiments where the data have been stored in data storage, they are retrieved and then loaded into the process. Preferably, the data can be loaded onto workspace to a text file or loaded into a spreadsheet for analysis. Next, the filter process 302 filters the data to remove any imperfections. As merely an example, data from the present data acquisition device are often accompanied with glitches, high frequency noise, and the like. Here, the signal to noise ratio is often an important consideration for pattern recognition, especially when concentrations of analytes are low, exceedingly high, or not within a predefined range of windows according to some embodiments. In such cases, it is desirable to boost the signal to noise ratio using the present digital filtering technology. Examples of such filtering technology include, but are not limited to, a Zero Phase Filter, an Adaptive Exponential Moving Average Filter, and a Savitzky-Golay Filter, which will be described in more detail below.
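By way of a hedged illustration, one common way to obtain a zero phase response is to run a smoothing filter forward over the data and then backward over the result, so that the delay introduced by the two passes cancels. The sketch below applies this idea with a fixed-coefficient exponential moving average; it is not the adaptive filter named above, and the names `ema` and `zero_phase_ema` and the default smoothing factor are assumptions made for the example.

```python
def ema(values, alpha):
    """Single-pass exponential moving average, seeded with the first value."""
    out, prev = [], values[0]
    for v in values:
        prev = alpha * v + (1.0 - alpha) * prev
        out.append(prev)
    return out


def zero_phase_ema(values, alpha=0.3):
    """Smooth forward, then backward, so the lag of the two passes cancels."""
    if not values:
        return []
    forward = ema(values, alpha)
    backward = ema(forward[::-1], alpha)
    return backward[::-1]


# Example: a single-sample glitch is attenuated and spread over its neighbors.
smoothed = zero_phase_ema([0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0])
```

A smaller `alpha` smooths more aggressively at the cost of flattening genuine response peaks, so the value would need to be tuned to the sensor data at hand.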

The data go through a baseline correction process 305. Depending upon the embodiment, there can be many different ways to implement a baseline correction process. Here, the baseline correction process finds response peaks, calculates ΔR/R, and plots ΔR/R versus the time stamps at which the data were captured. It also calculates maximum ΔR/R and maximum slope of ΔR/R for further processing. Baseline drift is often corrected by way of the present process. The main process manager also ensures that the data pass through the normalization process 307. In some embodiments, normalization is a row wise operation. Here, the process uses a so-called area normalization. After such normalization method, the sum of data along each row is unity. Vector length normalization is also used, where the sum of data squared of each row equals unity.
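The calculations described above can be sketched as follows. The sketch assumes that ΔR/R is the fractional change of a sensor resistance R relative to a baseline resistance R0, i.e. (R - R0)/R0, which the text does not state explicitly; all function names are hypothetical.

```python
import math


def delta_r_over_r(resistances, baseline):
    """Fractional response relative to a baseline resistance (assumed definition)."""
    return [(r - baseline) / baseline for r in resistances]


def max_response_and_slope(dr_r, timestamps):
    """Maximum dR/R and maximum slope of dR/R between consecutive time stamps."""
    slopes = [(dr_r[i + 1] - dr_r[i]) / (timestamps[i + 1] - timestamps[i])
              for i in range(len(dr_r) - 1)]
    return max(dr_r), max(slopes)


def area_normalize(row):
    """Scale a row so that its entries sum to one (row sum must be nonzero)."""
    total = sum(row)
    return [x / total for x in row]


def vector_length_normalize(row):
    """Scale a row so that the sum of its squared entries is one."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]
```

Each row here stands for the responses of all sensors to one exposure, so both normalizations remove overall response magnitude and keep only the relative pattern across sensors.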

Next, the method performs a main process for classifying each of the substances according to each of their characteristics in a pattern recognition process. The pattern recognition process uses more than one algorithm, which are known, are presently being developed, or will be developed in the future. The process is used to find weighting factors for each of the characteristics to ultimately determine an identifiable pattern to uniquely identify each of the substances. That is, descriptors are provided for each of the substances. Examples of some algorithms are described throughout the present specification. Also shown is the output module 311. The output module is coupled to the process manager. The output module provides for the output of data from any one of the above processes as well as others. The output module can be coupled to one of a plurality of output devices. These devices include, among others, a printer, a display, and a network interface card. The present system can also include other modules. Depending upon the embodiment, these and other modules can be used to implement the methods according to the present invention.

The above processes are merely illustrative. The processes can be performed using computer software or hardware or a combination of hardware and software. Any of the above processes can also be separated or be combined, depending upon the embodiment. In some cases, the processes can also be changed in order without limiting the scope of the invention claimed herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives.

FIG. 3A is a simplified diagram of a top-view 350 of an information-capturing device according to an embodiment of the present invention. This diagram is merely an example, which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives. As shown, the top view diagram includes an array of sensors, 351A, 351B, 351C, 359 nth. The array is arranged in rows 351, 352, 355, 357, 359 and columns, which are normal to each other. Each of the sensors has an exposed surface for capturing, for example, olfactory information from fluids, e.g., liquid and/or vapor. The diagram shown is merely an example. Details of such information-capturing device are provided in U.S. application Ser. No. 09/548,948 and U.S. Pat. No. 6,085,576, commonly assigned, and hereby incorporated by reference for all purposes. Other devices are commercially available from Osmetech, Hewlett Packard, Alpha-MOS, or other companies.

Although the above has been described in terms of a capturing device for fluids including liquids and/or vapors, there are many other types of capturing devices. For example, other types of information capturing devices for converting an intrinsic or extrinsic characteristic to a measurable parameter can be used. These information capturing devices include, among others, pH monitors, temperature measurement devices, humidity devices, pressure sensors, flow measurement devices, chemical detectors, velocity measurement devices, weighing scales, length measurement devices, color identification, and other devices. These devices can provide an electrical output that corresponds to measurable parameters such as pH, temperature, humidity, pressure, flow, chemical types, velocity, weight, height, length, and size.

In some aspects, the present invention can be used with at least two sensor arrays. The first array of sensors comprises at least two sensors (e.g., three, four, hundreds, thousands, millions or even billions) capable of producing a first response in the presence of a chemical stimulus. Suitable chemical stimuli capable of detection include, but are not limited to, a vapor, a gas, a liquid, a solid, an odor or mixtures thereof. This aspect of the device comprises an electronic nose. Suitable sensors comprising the first array of sensors include, but are not limited to, a conducting/nonconducting regions sensor, a SAW sensor, a quartz microbalance sensor, a conductive composite sensor, a chemiresistor, a metal oxide gas sensor, an organic gas sensor, a MOSFET, a piezoelectric device, an infrared sensor, a sintered metal oxide sensor, a Pd-gate MOSFET, a metal FET structure, an electrochemical cell, a conducting polymer sensor, a catalytic gas sensor, an organic semiconducting gas sensor, a solid electrolyte gas sensor, and a piezoelectric quartz crystal sensor. It will be apparent to those of skill in the art that the electronic nose array can be comprised of combinations of the foregoing sensors. A second sensor can be a single sensor or an array of sensors capable of producing a second response in the presence of physical stimuli. The physical detection sensors detect physical stimuli. Suitable physical stimuli include, but are not limited to, thermal stimuli, radiation stimuli, mechanical stimuli, pressure, visual, magnetic stimuli, and electrical stimuli.

Thermal sensors can detect stimuli which include, but are not limited to, temperature, heat, heat flow, entropy, heat capacity, etc. Radiation sensors can detect stimuli that include, but are not limited to, gamma rays, X-rays, ultra-violet rays, visible, infrared, microwaves and radio waves. Mechanical sensors can detect stimuli which include, but are not limited to, displacement, velocity, acceleration, force, torque, pressure, mass, flow, acoustic wavelength, and amplitude. Magnetic sensors can detect stimuli that include, but are not limited to, magnetic field, flux, magnetic moment, magnetization, and magnetic permeability. Electrical sensors can detect stimuli which include, but are not limited to, charge, current, voltage, resistance, conductance, capacitance, inductance, dielectric permittivity, polarization and frequency.

In certain embodiments, thermal sensors suitable for use in the present invention include, but are not limited to, thermocouples, such as semiconducting thermocouples, noise thermometry, thermoswitches, thermistors, metal thermoresistors, semiconducting thermoresistors, thermodiodes, thermotransistors, calorimeters, thermometers, indicators, and fiber optics.

In other embodiments, various radiation sensors suitable for use in the present invention include, but are not limited to, nuclear radiation microsensors, such as scintillation counters and solid state detectors; ultra-violet, visible and near infrared radiation microsensors, such as photoconductive cells, photodiodes, and phototransistors; and infrared radiation microsensors, such as photoconductive IR sensors and pyroelectric sensors.

In certain other embodiments, various mechanical sensors are suitable for use in the present invention and include, but are not limited to, displacement microsensors, capacitive and inductive displacement sensors, optical displacement sensors, ultrasonic displacement sensors, pyroelectric, velocity and flow microsensors, transistor flow microsensors, acceleration microsensors, piezoresistive microaccelerometers, force, pressure and strain microsensors, and piezoelectric crystal sensors.

In certain other embodiments, various chemical or biochemical sensors are suitable for use in the present invention and include, but are not limited to, metal oxide gas sensors, such as tin oxide gas sensors, organic gas sensors, chemocapacitors, chemodiodes, such as inorganic Schottky device, metal oxide field effect transistor (MOSFET), piezoelectric devices, ion selective FET for pH sensors, polymeric humidity sensors, electrochemical cell sensors, pellistors gas sensors, piezoelectric or surface acoustical wave sensors, infrared sensors, surface plasmon sensors, and fiber optical sensors.

Various other sensors suitable for use in the present invention include, but are not limited to, sintered metal oxide sensors, phthalocyanine sensors, membranes, Pd-gate MOSFET, electrochemical cells, conducting polymer sensors, lipid coating sensors and metal FET structures. In certain preferred embodiments, the sensors include, but are not limited to, metal oxide sensors such as Taguchi gas sensors, catalytic gas sensors, organic semiconducting gas sensors, solid electrolyte gas sensors, piezoelectric quartz crystal sensors, fiber optic probes, a micro-electro-mechanical system device, a micro-opto-electro-mechanical system device and Langmuir-Blodgett films.

Additionally, the above description in terms of specific hardware is merely for illustration. It would be recognized that the functionality of the hardware can be combined or even separated among hardware elements and/or software. The functionality can also be implemented predominantly in software or in a combination of hardware and software. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Details of methods according to the present invention are provided below.

A method using digital olfaction information for populating a database for identification or classification purposes according to the present invention may be briefly outlined as follows:

1. Acquire olfactory data, where the data are for one or more substances, each of the substances having a plurality of distinct characteristics;

2. Convert olfactory data into electronic form;

3. Provide olfaction data in electronic form (e.g., text, normalized data from an array of sensors) for classification or identification;

4. Load the data into a first memory by a computing device;

5. Retrieve the data from the first memory;

6. Remove first noise levels from the data using one or more filters;

7. Correct data to a baseline for one or more variables such as drift, temperature, humidity, etc.;

8. Normalize data using a baseline;

9. Reject one or more of the plurality of distinct characteristics from the data;

10. Perform one or more pattern recognition methods on the data;

11. Classify the one or more substances based upon the pattern recognition methods to form multiple classes that each corresponds to a different substance;

12. Determine optimized (or best general fit) pattern recognition method via cross validation process;

13. Store the classified substances into a second memory for further analysis; and

14. Perform other steps, as desirable.

The above sequence of steps is merely an example of a way to teach or train the present method and system. The present example takes more than one different substance, where each substance has a plurality of characteristics, which are capable of being detected by sensors. Each of these characteristics is measured, and then fed into the present method to create a training set. The method includes a variety of data processing techniques to provide the training set. Depending upon the embodiment, some of the steps may be separated even further or combined. Details of these steps are provided below according to FIGS.

FIGS. 4A to 4B are simplified diagrams of methods according to embodiments of the present invention. These diagrams are merely examples, which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives. As shown, the present method 400 begins at start, step 401. The method then captures data (step 403) from a data acquisition device. The data acquisition device can be any suitable device for capturing either intrinsic or extrinsic information from a substance. As merely an example, the present method uses a data acquisition device for capturing olfactory information. The device has a plurality of sensors, which convert a scent or olfaction print into an artificial or electronic print. In a specific embodiment, such data acquisition device is disclosed in WO 99/47905, WO 00/52444 and WO 00/79243, all commonly assigned and hereby incorporated by reference for all purposes. Those of skill in the art will know of other devices including other electronic noses suitable for use in the present invention. In a specific embodiment, the present invention captures olfactory information from a plurality of different liquids, e.g., isopropyl alcohol, water, toluene. The olfactory information from each of the different liquids is characterized by a plurality of measurable characteristics, which are acquired by the acquisition device. Each different liquid including the plurality of measurable characteristics can be converted into an electronic data form for use according to the present invention. Some of these characteristics were previously described, but can also include others.

Next, the method transfers the data, now in electronic form, to a computer-aided process (step 405). The computer-aided process may be automatic and/or semiautomatic depending upon the application. The computer-aided process can store the data into memory, which is coupled to a processor. When the data is ready for use, the data is loaded into the process, step 407. In embodiments where the data has been stored, they are retrieved and then loaded into the process. Preferably, the data can be loaded onto a workspace to a text file or loaded into a spreadsheet for analysis. Here, the data can be loaded continuously and automatically, or be loaded manually, or be loaded and monitored continuously to provide real time analysis.

The method filters the data (step 411) to remove any imperfections. As merely an example, data from the present data acquisition device are often accompanied with glitches, high frequency noise, and the like. Here, the signal to noise ratio is often an important consideration for pattern recognition especially when concentrations of analytes are low, exceedingly high, or not within a predefined range of windows according to some embodiments. In such cases, it is desirable to boost the signal to noise ratio using the present digital filtering technology. Examples of such filtering technology include, but are not limited to, a Zero Phase Filter, an Adaptive Exponential Moving Average Filter, and a Savitzky-Golay Filter, which will be described in more detail below.
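As merely an illustration of two of the named filter families, the following Python sketch smooths a sensor trace. It is not the filter implementation of the present method: the exponential moving average shown is the plain, non-adaptive form with an assumed smoothing factor `alpha`, and the Savitzky-Golay example is limited to the standard 5-point quadratic window with coefficients (-3, 12, 17, 12, -3)/35. The function names are hypothetical.

```python
def exponential_moving_average(signal, alpha=0.3):
    """Plain (non-adaptive) EMA: y[i] = alpha * x[i] + (1 - alpha) * y[i - 1].

    alpha is an assumed smoothing factor; an adaptive variant would
    change it from sample to sample.
    """
    if not signal:
        return []
    smoothed = [float(signal[0])]
    for x in signal[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Standard 5-point quadratic Savitzky-Golay smoothing coefficients (sum = 35).
SG_COEFFS = (-3, 12, 17, 12, -3)


def savitzky_golay_5(signal):
    """Smooth interior points with a 5-point quadratic Savitzky-Golay window.

    The two points at each end are left unchanged in this sketch.
    """
    out = [float(v) for v in signal]
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / 35.0
    return out
```

A quadratic trace passes through the Savitzky-Golay window unchanged, while an isolated glitch is attenuated, which is the property that makes this family attractive for preserving response peaks.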

Optionally, the filtered responses can be displayed, step 415. Here, the present method performs more than one of the filtering techniques to determine which one provides better results. By way of the present method, it is possible to view the detail of data preprocessing. The method displays outputs (step 415) for each of the sensors, where signal to noise levels can be visually examined. Alternatively, analytical techniques can be used to determine which of the filters worked best. Each of the filters is used on the data, step 416 via branch 418. Once the desired filter has been selected, the present method goes to the next step.

The method performs a baseline correction step (step 417). Depending upon the embodiment, there can be many different ways to implement a baseline correction method. Here, the baseline correction method finds response peaks, calculates ΔR/R, and plots the ΔR/R versus time stamps, where the data have been captured. It also calculates maximum ΔR/R and maximum slope of ΔR/R for further processing. Baseline drift is often corrected by way of the present step. Once baseline drift has been corrected, the present method undergoes a normalization process, although other processes can also be used. Here, ΔR/R can be determined using one of a plurality of methods, which are known, if any, or developed according to the present invention. As will be apparent to those of skill in the art, although in the example resistance is used, the method can use impedance, voltage, capacitance and the like as a sensor response.

As merely an example, FIG. 4C illustrates a simplified plot of a signal and various components used in the calculation of ΔR/R, which can be used depending upon the embodiment. This diagram is merely an illustration, which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives. As shown, the diagram shows a pulse, which is plotted along a time axis, which intersects a voltage, for example. The diagram includes a ΔR (i.e., delta R), which is defined between R and R(max). As merely an example, ΔR/R is defined by the following expression:

ΔR/R = (R(max) − R(0))/R

wherein: ΔR is defined by the average difference between a baseline value R(0) and R(max); R(max) is defined by a maximum value of R; R(0) is defined by an initial value of R; and R is defined as a variable or electrical measurement of resistance from a sensor, for example.

This expression is merely an example; the term ΔR/R could be defined by a variety of other relationships. Here, ΔR/R has been selected in a manner to provide an improved signal to noise ratio for the signals from the sensor, for example. There can be many other relationships that define ΔR/R, which may be a relative relation in another manner. Alternatively, ΔR/R could be an absolute relationship or a combination of a relative relationship and an absolute relationship. Of course, one of ordinary skill in the art would provide many other variations, alternatives, and modifications.
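The relative response can be sketched in Python as follows. This is only one possible reading of the expression: the denominator R is assumed here to be the baseline value R(0), taken as the first sample of the trace, and the function names are hypothetical. The sketch also computes the maximum ΔR/R and the maximum slope of ΔR/R, the two quantities the baseline correction step passes on for further processing.

```python
def fractional_response(resistances):
    """Return the curve (R(t) - R(0)) / R(0) for one sensor trace.

    Assumption: R(0) is the first sample and is used as the denominator.
    """
    r0 = resistances[0]
    if r0 == 0:
        raise ValueError("baseline resistance R(0) must be non-zero")
    return [(r - r0) / r0 for r in resistances]


def max_delta_r_over_r(resistances):
    """Maximum of the curve, i.e. (R(max) - R(0)) / R(0) for positive R(0)."""
    return max(fractional_response(resistances))


def max_slope(times, resistances):
    """Largest finite-difference slope of the response curve versus time."""
    curve = fractional_response(resistances)
    return max(
        (curve[i + 1] - curve[i]) / (times[i + 1] - times[i])
        for i in range(len(curve) - 1)
    )
```

For a trace of 100, 102, 110, 105 ohms sampled at one-second intervals, the maximum ΔR/R is (110 − 100)/100 = 0.1 and the maximum slope is 0.08 per second.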

As noted, the method includes a normalization step, step 419. In some embodiments, normalization is a row wise operation. Here, the method uses a so-called area normalization. After such normalization method, the sum of data along each row is unity. Vector length normalization is also used, where the sum of data squared of each row equals unity.
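The two row wise normalizations described above can be sketched in Python as follows, where each row holds the responses of all sensors for one exposure. The function names are hypothetical, and the handling of an all-zero row (raising an error) is an assumption of this sketch.

```python
import math


def area_normalize(row):
    """Scale a row so that its elements sum to unity."""
    total = sum(row)
    if total == 0:
        raise ValueError("cannot area-normalize a row that sums to zero")
    return [v / total for v in row]


def vector_length_normalize(row):
    """Scale a row so that the sum of its squared elements equals unity."""
    length = math.sqrt(sum(v * v for v in row))
    if length == 0:
        raise ValueError("cannot normalize a zero-length row")
    return [v / length for v in row]
```

Either form removes the overall magnitude of a response (which largely tracks concentration) so that pattern recognition compares the shape of the response across sensors.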

As shown by step 421, the method may next perform certain preprocessing techniques. Preprocessing can be employed to eliminate the effect on the data of inclusion of the mean value in data analysis, or of the use of particular units of measurement, or of large differences in the scale of the different data types received. Examples of such preprocessing techniques include mean centering and auto scaling. Preprocessing techniques utilized for other purposes include, for example, smoothing, outlier rejection, drift monitoring, and others. Some of these techniques will be described later. Once preprocessing has been completed, the method performs a detailed processing technique.
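Mean centering and auto scaling are column wise operations, in contrast to the row wise normalization. A minimal Python sketch follows, assuming rows are samples and columns are sensors, and assuming the sample standard deviation (divisor n − 1) for auto scaling; the function names are hypothetical.

```python
import math


def mean_center(matrix):
    """Subtract each column's mean so that every column averages to zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def auto_scale(matrix):
    """Mean-center, then divide each column by its sample standard deviation.

    A constant column (zero deviation) is mapped to zeros in this sketch.
    """
    centered = mean_center(matrix)
    n = len(centered)
    stds = [
        math.sqrt(sum(v * v for v in col) / (n - 1))
        for col in zip(*centered)
    ]
    return [
        [v / s if s > 0 else 0.0 for v, s in zip(row, stds)]
        for row in centered
    ]
```

With the input rows (1, 100), (2, 300), (3, 500), auto scaling yields (−1, −1), (0, 0), (1, 1): the second column no longer dominates merely because it is expressed in larger units.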

Next, the method performs a main process for classifying each of the substances according to each of their characteristics, step 423. Here, the present method performs a pattern recognition process, such as the one illustrated by the simplified diagram in FIG. 4B. This diagram is merely an example, which should not limit the scope of the claims herein.

As shown, method 430 begins with start, step 428. The method queries a library, including a plurality of pattern recognition algorithms (e.g., Table I below), and loads (step 431) one or more of the algorithms in memory to be used. The method selects the one algorithm, step 432, and runs the data through the algorithm, step 433. In a specific embodiment, the pattern recognition process uses more than one algorithm, which are known, are presently being developed, or will be developed in the future. The process is used to find weighting factors based upon descriptors for each of the characteristics to ultimately determine an identifiable pattern to uniquely identify each of the substances. The present method runs the data, which have been preprocessed, through each of the algorithms. Representative algorithms are set forth in Table I.

TABLE I

PCA          Principal Components Analysis
HCA          Hierarchical Cluster Analysis
KNN CV       K Nearest Neighbor Cross Validation
KNN Prd      K Nearest Neighbor Prediction
SIMCA CV     SIMCA Cross Validation
SIMCA Prd    SIMCA Prediction
Canon CV     Canonical Discriminant Analysis and Cross Validation
Canon Prd    Canonical Discriminant Prediction
Fisher CV    Fisher Linear Discriminant Analysis and Cross Validation
Fisher Prd   Fisher Linear Discriminant Prediction

PCA and HCA are unsupervised learning methods. They can be used for investigating training data and answering the questions in Table II.

TABLE II

I.    How many principal components will cover the most of variances?
II.   How many principal components to choose?
III.  How do the loading plots look?
IV.   How do the score plots look?
V.    How are the scores separated among the classes?
VI.   How are the clusters grouped in their classes?
VII.  How much are the distances among the clusters?

The other four algorithms, KNN CV, SIMCA CV, Canon CV, and Fisher CV, are supervised learning methods used when the goal is to construct models to be used to classify future samples. These algorithms will do cross validation, find the optimum number of parameters, and build models.

Once the data has been run through the first algorithm, for example, the method repeats through a branch (step 435) to step 432 to another process. This process is repeated until one or more of the algorithms have been used to analyze the data. The process is repeated to try to find a desirable algorithm that provides good results with a specific preprocessing technique used to prepare the data. If all of the desirable algorithms have been used, the method stores (or has previously stored) (step 437) each of the results of the processes on the data in memory.

In a specific embodiment, the present invention provides a cross-validation technique. Here, an auto (or automatic) cross-validation algorithm has been implemented. The present technique uses cross-validation, which is an operation process used to validate models built with chemometrics algorithms based on a training data set. During the process, the training data set is divided into calibration and validation subsets. A model is built with the calibration subset and is used to predict the validation subset. The training data set can be divided into calibration and validation subsets by a technique called “leave-one-out”, i.e., take one sample out from each class to build a validation subset and use the remaining samples to build a calibration subset. This process can be repeated using different subsets until every sample in the training set has been included in one validation subset. The predicted results are stored in an array. Then, the correct prediction percentages (CPP) are calculated, and are used to validate the performance of the model. One of ordinary skill in the art would recognize other techniques for determining calibration and validation sets when performing either internal cross-validation or external cross-validation.
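The leave-one-out procedure described above can be sketched in Python as follows. In each pass, one sample from every class is held out as the validation subset, a model is built from the remaining calibration samples, and the held-out samples are predicted; the correct prediction percentage (CPP) is computed once every sample has been held out. A simple K-nearest-neighbor classifier with Euclidean distance stands in for the model here as an assumption of the sketch, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def knn_predict(calibration, query, k=1):
    """Majority vote among the k calibration samples nearest to query.

    calibration is a list of (features, label) pairs.
    """
    nearest = sorted(calibration, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_cpp(samples, k=1):
    """Return the correct prediction percentage over all held-out samples."""
    by_class = defaultdict(list)
    for index, (_, label) in enumerate(samples):
        by_class[label].append(index)
    passes = max(len(indices) for indices in by_class.values())

    predictions = {}
    for p in range(passes):
        # Validation subset: the p-th sample of every class that has one.
        held_out = {idx[p] for idx in by_class.values() if p < len(idx)}
        calibration = [s for i, s in enumerate(samples) if i not in held_out]
        for i in held_out:
            predictions[i] = knn_predict(calibration, samples[i][0], k)

    correct = sum(predictions[i] == samples[i][1] for i in predictions)
    return 100.0 * correct / len(samples)
```

Running the same function with different models or settings on one training set gives directly comparable CPP values.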

According to the present method, a cross-validation with one training data set can be applied to generally all the models built with different algorithms, such as K-Nearest Neighbor (KNN), SIMCA, Canonical Discriminant Analysis, and Fisher Linear Discriminant Analysis, respectively. The results of correct prediction percentages (CPP) show the performance differences with the same training data set but with different algorithms. Therefore, one can pick the best algorithm according to the embodiment.

During the model building, there are several parameters and options to choose. To build the best model with one algorithm, cross-validation is also used to find the optimum parameters and options. For example, in the process of building a KNN model, cross-validation is used to validate the models built with different values of K, different scaling options, e.g., mean-centering or auto-scaling, and other options, e.g., with PCA or without PCA, to find out the optimum combination of K and other options. In an alternative embodiment, auto-cross-validation is implemented using a single push-button for ease in use. It automatically runs the processes mentioned above over all the (or any selected) algorithms with the training data set to determine the optimum combination of parameters, scaling options and algorithms.
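The search over parameters and scaling options can be sketched in Python as a loop over every combination, ranked by correct prediction percentage. This sketch is not the auto-cross-validation implementation itself: it covers only a KNN model, uses single-sample leave-one-out, omits the PCA option, and applies scaling to the whole training set before validation for brevity. The names `SCALINGS`, `knn_cpp` and `auto_cross_validate` are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def mean_center(rows):
    means = [sum(col) / len(col) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def auto_scale(rows):
    centered = mean_center(rows)
    # "or 1.0" leaves constant columns unscaled instead of dividing by zero.
    stds = [
        math.sqrt(sum(v * v for v in col) / (len(col) - 1)) or 1.0
        for col in zip(*centered)
    ]
    return [[v / s for v, s in zip(row, stds)] for row in centered]


SCALINGS = {
    "none": lambda rows: [list(row) for row in rows],
    "mean-centering": mean_center,
    "auto-scaling": auto_scale,
}


def knn_cpp(rows, labels, k):
    """Leave-one-out correct prediction percentage for a KNN model."""
    correct = 0
    for i, query in enumerate(rows):
        others = [
            (math.dist(query, row), labels[j])
            for j, row in enumerate(rows) if j != i
        ]
        nearest = sorted(others, key=lambda pair: pair[0])[:k]
        vote = Counter(label for _, label in nearest).most_common(1)[0][0]
        correct += vote == labels[i]
    return 100.0 * correct / len(rows)


def auto_cross_validate(rows, labels, k_values=(1, 3)):
    """Return (CPP, scaling name, K) tuples, best combination first."""
    results = [
        (knn_cpp(scale(rows), labels, k), name, k)
        for (name, scale), k in product(SCALINGS.items(), k_values)
    ]
    results.sort(key=lambda item: -item[0])
    return results
```

The first entry of the returned list is the combination with the highest CPP, which corresponds to the percentage list from which the best option is judged.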

The method also performs additional steps of retrieving data, step 438, and retrieving the process or algorithm, step 439. As noted, each of the processes can form a descriptor for each sample in the training set. Each of these descriptors can be stored and retrieved. Here, the method stores the raw data, the preprocessed data, the descriptors, and the algorithm used for the method for each algorithm used according to the present invention. The method stops at step 441.

The above sequence of steps is merely illustrative. The steps can be performed using computer software or hardware or a combination of hardware and software. Any of the above steps can also be separated or be combined, depending upon the embodiment. In some cases, the steps can also be changed in order without limiting the scope of the invention claimed herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives.

An alternative method according to the present invention is briefly outlined as follows:

1. Acquire raw data in voltages;

2. Check baseline voltages;

3. Filter;

4. Calculate ΔR/R;

5. Determine whether training set;

6. If yes, find samples (may repeat process);

7. Determine whether outlier;

8. If yes, remove bad data using, for example, PCA;

9. Find important sensors using importance index (individual filtering process);

10. Normalize;

11. Find appropriate pattern recognition process;

12. Run each pattern recognition process;

13. Display (optional);

14. Find best fit out of each pattern recognition process;

15. Compare against confidence factor;

16. Perform other steps, as required.

The above sequence of steps is merely an example of a way to teach or train the present method and system according to an alternative embodiment. The present example takes more than one different substance, where each substance has a plurality of characteristics, which are capable of being detected by sensors or other sensing devices. Each of these characteristics is measured, and then fed into the present method to create a training set. The method includes a variety of data processing techniques to provide the training set. Depending upon the embodiment, some of the steps may be separated even further or combined. Details of these steps are provided below according to FIGS.

FIGS. 4D and 4E are simplified diagrams of methods according to embodiments of the present invention. These diagrams are merely examples, which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives. As shown, the present method 450 begins at step 451. Here, the method begins at a personal computer host interface, where the method provides a training set of samples (which are each defined as a different class of material) to be analyzed or an unknown sample (once the training set has been processed). The training set can be derived from a plurality of different samples of fluids (or other substances or information). The samples can range in number from more than one to more than five or more than ten or more than twenty in some applications. The present method processes one sample at a time through the method that loops back to step 451 via the branch indicated by reference letter B, for example, from step 461, which will be described in more detail below.

In a specific embodiment, the method has captured data about the plurality of samples from a data acquisition device. Here, each of the samples forms a distinct class of data according to the present invention. The data acquisition device can be any suitable device for capturing either intrinsic or extrinsic information from a substance. As merely an example, the present method uses a data acquisition device for capturing olfactory information. The device has a plurality of sensors or sensing devices, which convert a scent or olfaction print into an artificial or electronic print. In a specific embodiment, such data acquisition device is disclosed in WO 99/47905, WO 00/52444 and WO 00/79243, all commonly assigned and hereby incorporated by reference for all purposes. Those of skill in the art will know of other devices including other electronic noses suitable for use in the present invention. In a specific embodiment, the present invention captures olfactory information from a plurality of different liquids, e.g., isopropyl alcohol, water, toluene. The olfactory information from each of the different liquids is characterized by a plurality of measurable characteristics, which are acquired by the acquisition device. Each different liquid including the plurality of measurable characteristics can be converted into an electronic data form for use according to the present invention.

The method acquires the raw data from the sample in the training set often as a voltage measurement, step 452. The voltage measurement is often plotted as a function of time. In other embodiments, there are many other ways to provide the raw data. For example, the raw data can be supplied as a resistance, a current, a capacitance, an inductance, a binary characteristic, a quantized characteristic, a range value or values, and the like. Of course, the type of raw data used depends highly upon the application. In some embodiments, the raw data can be measured multiple times, where an average is calculated. The average can be a time weighted value, a mathematical weighted value, and others.

Next, the method checks the baseline voltages from the plurality of sensing devices used to capture information from the sample, as shown in step 453. The method can perform any of the baseline correction methods described herein, as well as others. Additionally, the method can merely check to see if each of the sensing devices has an output voltage within a predetermined range. If each of the sensing devices has an output voltage within a predetermined range, each of the sensing devices has a baseline voltage that is not out of range. Here, the method continues to the next step. Alternatively, the method goes to step 455, which rejects the sensing device that is outside of the predetermined voltage range, and then continues to the next step. In some embodiments, the sensing device that is outside of the range is a faulty or bad sensor, which should not be used for training or analysis purposes.

The method then determines if the measured voltage for each sensing device is within a predetermined range, step 454. Exposing the sensor to the sample provides the voltage for each sensor. The exposure can be made for a predetermined amount of time. Additionally, the exposure can be repeated and averaged, either by time or geometrically. The voltage is compared with a range or set of ranges, which often characterize the sensor for the exposure. If the exposed sensing device is outside of its predetermined range for the exposure, the method can reject (step 455) the sensor and proceed to the next step. The rejected sensor may be faulty or bad. Alternatively, if each of the sensing devices in, for example, the array of sensors is within a respective predetermined range, then the method continues to the next step, which will be discussed below.
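The range checks of steps 453 and 454 and the rejection of step 455 can be sketched in Python as follows. The per-sensor ranges, the sensor identifiers, and the function name are hypothetical; the sketch only shows a sensor being accepted when its voltage lies inside its own predetermined range and rejected otherwise.

```python
def split_sensors_by_range(voltages, ranges):
    """Separate in-range sensor readings from rejected sensors.

    voltages: dict mapping sensor id -> measured voltage
    ranges:   dict mapping sensor id -> (low, high) predetermined range
    Returns (accepted readings, sorted list of rejected sensor ids).
    """
    accepted, rejected = {}, []
    for sensor_id, voltage in voltages.items():
        low, high = ranges[sensor_id]
        if low <= voltage <= high:
            accepted[sensor_id] = voltage
        else:
            rejected.append(sensor_id)
    return accepted, sorted(rejected)
```

The same function serves for the baseline check and for the exposure check; only the table of ranges differs between the two.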

The method can convert the voltage into a resistance value, step 456. Alternatively, the voltage can be converted to a capacitance, an inductance, an impedance, or other measurable characteristic. In some embodiments, the voltage is merely converted using a predetermined relationship for each of the sensing devices. Alternatively, there may be a look up table, which correlates voltages with resistances. Still further, there can be a mathematical relationship that correlates the voltage with the resistance.
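As an illustration of the look up table alternative, the following Python sketch converts a voltage to a resistance by linear interpolation between calibration points. The table values in the usage note are invented for illustration, the choice of linear interpolation between entries is an assumption, and the function name is hypothetical.

```python
from bisect import bisect_right


def voltage_to_resistance(voltage, table):
    """Interpolate a resistance from a (voltage, resistance) look up table.

    table must be sorted by strictly increasing voltage; voltages outside
    the table are rejected rather than extrapolated.
    """
    volts = [v for v, _ in table]
    if not volts[0] <= voltage <= volts[-1]:
        raise ValueError("voltage outside the look up table")
    upper = bisect_right(volts, voltage)
    if upper == len(volts):          # exactly the last table entry
        return float(table[-1][1])
    (v0, r0), (v1, r1) = table[upper - 1], table[upper]
    return r0 + (r1 - r0) * (voltage - v0) / (v1 - v0)
```

With a table of (0.0 V, 1000 Ω), (1.0 V, 2000 Ω), (2.0 V, 4000 Ω), a reading of 1.5 V maps to 3000 Ω.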

The method then runs the data through one or more filters, step 457. The method filters the data to remove any imperfections, noise, and the like. As merely an example, data from the present data acquisition device are often accompanied with glitches, high frequency noise, and the like. Here, the signal to noise ratio is often an important consideration for pattern recognition especially when concentrations of analytes are low, exceedingly high, or not within a predefined range of windows according to some embodiments. In such cases, it is desirable to boost the signal to noise ratio using the present digital filtering technology. Examples of such filtering technology include, but are not limited to, a Zero Phase Filter, an Adaptive Exponential Moving Average Filter, and a Savitzky-Golay Filter.

The method runs a response on the data, step 458. Here, the method may perform a baseline correction step. Depending upon the embodiment, there can be many different ways to implement a baseline correction method. Here, the baseline correction method finds response peaks, calculates ΔR/R, and plots the ΔR/R versus time stamps, where the data have been captured. It also calculates maximum ΔR/R and maximum slope of ΔR/R for further processing. Baseline drift is often corrected by way of the present step. Once baseline drift has been corrected, the present method undergoes a normalization process, although other processes can also be used. Here, ΔR/R can be determined using one of a plurality of methods, which are known, if any, or developed according to the present invention.

In the present embodiment, the method is for analyzing a training set of substances, step 459 (in FIG. 4E). The method then continues to step 461. Alternatively, the method skips to step 467, which will be described in one or more of the copending applications. If there is another substance in the training set to be analyzed (step 459), the method returns to step 452 via branch B, as noted above. Here, the method continues until each of the substances in the training set has been run through the process in the present preprocessing steps. The other samples will run through generally each of the above steps, as well as others, in some embodiments.

Next, the method goes to step 463. This step determines if any of the data has an outlier. In the present embodiment, the outlier is a data point, which does not provide any meaningful information to the method. Here, the outlier can be a data point that is outside of the noise level, where no conclusions can be made. The outlier is often thought of as a data point that is tossed out due to statistical deviations or because of a special cause of variation. That is, lowest and highest data points can be considered as outliers in some embodiments. If outliers are found, step 463, the method can retake (step 465) samples, which are exposed to the sensing devices, that have the outliers. The samples that are retaken loop back through the process via the branch indicated by reference letter B. Outliers can be removed from the data in some embodiments.

The method also can uncover important sensors using an importance index (individual filtering process). Here, the method identifies which sensors do not provide any significant information by comparing a like sensor output with a like sensor output for each of the samples in the training set. If certain sensors are determined to have little influence in the results, these sensors are ignored (step 473) and the method then continues to the next step, as shown. Alternatively, if generally all sensors are determined to have some significance, the method continues to step 467.
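The text does not define how the importance index is computed. Purely as an illustration of comparing like sensor outputs across the training classes, the following Python sketch scores each sensor by the ratio of its between-class variance to its average within-class variance and keeps the sensors whose score meets a threshold. The ratio, the threshold value, and the function names are all assumptions of this sketch rather than the index of the present method.

```python
def importance_index(classes, sensor):
    """Between-class variance over mean within-class variance for a sensor.

    classes: dict mapping class label -> list of rows (one row per exposure)
    sensor:  column index of the sensor being scored
    """
    class_means, within = [], []
    for rows in classes.values():
        values = [row[sensor] for row in rows]
        mean = sum(values) / len(values)
        class_means.append(mean)
        within.append(sum((v - mean) ** 2 for v in values) / len(values))
    grand = sum(class_means) / len(class_means)
    between = sum((m - grand) ** 2 for m in class_means) / len(class_means)
    mean_within = sum(within) / len(within)
    if mean_within == 0:
        # Perfectly repeatable sensor: important only if the classes differ.
        return float("inf") if between > 0 else 0.0
    return between / mean_within


def important_sensors(classes, n_sensors, threshold=1.0):
    """Indices of sensors whose index meets the (assumed) threshold."""
    return [
        s for s in range(n_sensors)
        if importance_index(classes, s) >= threshold
    ]
```

A sensor that responds almost identically to every class scores near zero and would be ignored, while a sensor whose class means are well separated relative to its scatter scores high.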

Next, the method performs post processing procedures (step 467), as defined herein. The post processing procedures include, for example, a normalization step. In a specific embodiment, the normalization step scales the data to one or another reference value and then autoscales the data so that each sample value is referenced against each other. If the data is for the training step, step 468, the method continues to a pattern recognition cross-validation process, step 469. The cross-validation process is used with step 470.

As described previously, the pattern recognition process uses more than one algorithm, for example from Table I, which are known, are presently being developed, or will be developed in the future. The process is used to find weighting factors for each of the characteristics to ultimately determine an identifiable pattern to uniquely identify each of the substances. The present method runs the data, which have been preprocessed, through each of the algorithms.

Once the best fit algorithm and model have been uncovered, the method goes through a discrimination test, step 471. In a specific embodiment, the method compares the results, e.g., fit of data against algorithm, combination of data and other preprocessing information, against a confidence factor (if the result is less than a certain number, the combination is rejected). This step provides a final screen on the data, the algorithm used, the pre-processing methods, and other factors to confirm that the results are consistent. If so, the method selects the final combination of techniques used according to an embodiment of the present invention.

The above sequence of steps is merely illustrative. The steps can be performed using computer software or hardware or a combination of hardware and software. Any of the above steps can also be separated or be combined, depending upon the embodiment. In some cases, the steps can also be changed in order without limiting the scope of the invention claimed herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives.

EXAMPLE

To prove the principle and operation of the present invention, a computer software program was coded and used to implement aspects of the present invention. This program is merely an example, which should not unduly limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives. Here, a program package named “Simulation” has been written in MATLAB with a graphical user interface (GUI) to simulate the data input from chemical sensors, data preprocessing and pattern recognition so that users can try different algorithms to find the best method to meet a certain application. This procedure includes many recommendations about details of operation to help users perform their specific task. It is demonstrated that “PC-Simulation” is a good and powerful tool in R&D. Details of Simulation are provided below according to the headings. The present invention provides a graphical user interface that includes a desktop workspace with a background.

1. Configuration

The “Simulation” package has been installed on a server. Here, MATLAB can be installed on client devices, where each of the client users accesses Simulation on the server. Once the MATLAB program has been installed on the client computer, the MATLAB icon appears on the computer. To launch the MATLAB program, the user double-clicks on the MATLAB icon.

2. Commands

Having launched the MATLAB program, a MATLAB command window with a few lines of notes is shown. There is a >> prompt on the left of the screen, followed by a cursor, which means that it is ready to receive a command. This command window is also called “workspace”. It is used to enter commands and to display results and error messages.

As an example, a few useful commands in MATLAB are set forth in Table III.

TABLE III

Command                             Description
whos                                list all the variables in the memory
cd                                  change directory
ls                                  list all the files in the directory of “work”
dir                                 the same as ls
clc                                 erase all in the command window
clear                               delete all the variables in the memory
clear variablename                  only delete the variable with that name
path                                list MATLAB path
save filename variablename          save variable or variables into a .mat file with filename, and store in the “work” directory
save filename variablename -ascii   save to a text file that can be loaded into Excel
load filename                       load variable or variables from the file into the workspace
global variablename                 enable to list global variables in the workspace
delete filename                     delete the file from the disk (“work” folder)
A = B;                              assign matrix A equal to B
A = B';                             assign matrix A equal to B transpose
A = B(3:5, :);                      A matrix consists of the rows 3 to 5 of B matrix
A = B(:, 2:9);                      A matrix consists of the columns 2 to 9 of B matrix

The convention for a data matrix in chemometrics is that columns are variables (sensors) and rows are samples (exposures). For example, A(2,12) refers to the data element on the second row (the second exposure) and the 12th column (sensor #12). A semicolon (;) at the end of a command line will suppress the data display on the workspace.

Sometimes it is desirable to manipulate the data to delete rows (samples) or columns (variables) from a matrix. Here, the command delsamps is used. To delete row 12 from a matrix called data, type in

>>a=delsamps(data, 12);

where a is the result matrix that comes from data without row 12.
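The row deletion above can be sketched outside MATLAB as follows. This is only an illustration in Python: `delete_rows` and `transpose` are hypothetical stand-ins, not the toolbox's delsamps, whose internals are not given here. It assumes 1-based row numbers, as in the MATLAB command, and shows that the same helper applied to the transposed matrix removes a column.

```python
def delete_rows(matrix, rows):
    """Return a copy of matrix without the given 1-based row numbers."""
    drop = {r - 1 for r in rows}  # convert to 0-based indices
    return [list(row) for i, row in enumerate(matrix) if i not in drop]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical data: 3 samples (rows) x 4 sensors (columns).
data = [[11, 12, 13, 14],
        [21, 22, 23, 24],
        [31, 32, 33, 34]]

a = delete_rows(data, [2])                        # drop sample 2
b = transpose(delete_rows(transpose(data), [3]))  # drop sensor 3
```

Deleting a column by transposing, deleting a row, and transposing back keeps a single deletion routine for both directions.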

To delete column 10 from a matrix called data, type in

>>b=delsamps(data', 10)';

where b is the result matrix that comes from data without column 10.

3. Import and Export Data

Using the save filename variablename -ascii command, data in the MATLAB workspace can be saved to a text file (tab-delimited). Then, it can be loaded into a spreadsheet such as Excel™ by Microsoft Corporation. On the other hand, if a data matrix exists in Excel, the data file can be saved to a tab-delimited text file. This can be done with a data matrix without headers. From the file menu of the MATLAB workspace, select “load workspace” to launch a dialogue box. Next, any tab-delimited data file can be loaded into the MATLAB workspace.
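A minimal sketch of the tab-delimited round trip described above, written in Python rather than MATLAB. The helper names `save_ascii` and `load_ascii` and the number format are assumptions made for illustration; an in-memory buffer stands in for the text file.

```python
import csv
import io


def save_ascii(matrix, handle):
    """Write a header-less numeric matrix as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for row in matrix:
        writer.writerow([f"{value:.7e}" for value in row])


def load_ascii(handle):
    """Read a header-less tab-delimited matrix back into a list of rows."""
    reader = csv.reader(handle, delimiter="\t")
    return [[float(cell) for cell in row] for row in reader if row]


# Hypothetical 2 x 2 matrix standing in for workspace data.
trainpk = [[0.12, 0.30], [0.25, 0.41]]
buffer = io.StringIO()  # stands in for a file in the "work" folder
save_ascii(trainpk, buffer)
buffer.seek(0)
restored = load_ascii(buffer)
```

Because the file carries no headers, the reader can convert every cell to a number directly.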

4. Method of Operation

The present method begins with a startup procedure. Here, at the cursor (>>|) prompt on the MATLAB workspace, typing “simulhh” starts the PC-Simulation program. The PC-Simulation GUI 500, shown in FIG. 5A, appears on the terminal. The figure is merely an example, which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives. The GUI includes at least the following parts:

-   (a) A series of pop-up menus 501 on the left panel simulate data loading and data preprocessing.
-   (b) A graphical display 503 at the center of the GUI shows the images and plots of simulation.
-   (c) A mini command window 505 at the lower center of the GUI reports the computation status and displays the results of simulation.
-   (d) A list-box and a push button (Load Training) 507 on the top right panel of the GUI simulate the handheld type data loading. During operation, samples are loaded one class after another 509. Any outlier, which is data outside an acceptable boundary, will be found and removed. The class information will be attached. Using the “Save” and “Load” buttons 507, training data can be saved to a file and can be reloaded into the workspace. A pop-up menu “Pattern Recognition” 511 on the right panel contains many algorithms for pattern recognition. They will be discussed in detail later.
-   (e) A push button “Auto CV” 513 initiates the auto cross validation mode. The program will alternately take a subset of the training data and use its class information to build models, and use the models to predict the rest of the training data. After calculating all the combinations of scaling and algorithms, the program will make a percentage list of correct predictions. The list will be shown on the mini command window. From there, a judgment can be made as to which algorithm works better in the application.
-   (f) An “info” button 517 displays the program information on the mini command window.
-   (g) A “Close” button 519 will stop and close the GUI program.

The GUI set forth in FIG. 5A is merely an example. It should only provide the reader an understanding of the present example, without unduly limiting the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives.

5. Load Data

After the data is loaded, the arrow 521 on the top-left pop-up menu of “Process Option” reveals two pop-up choices, i.e., “Labnose” and “Datalogger” 523. A cursor can be moved with the mouse button down to highlight “Labnose” and then released if chemical lab data is loaded from a file collected from the Keithley Instrument, which gathers resistance data. Having done this, a dialogue box browser will appear. From there, the data file can be searched for on the hard disk. Once a desired file is found, the open button retrieves the data from that data file. In a similar way, the “Datalogger” menu can be highlighted to load the data file collected from the Datalogger from the above capturing device. The mini command window will show the status of data loading. When the data loading is done, the method goes to the next processing step to choose one of the digital filters.

6. Digital filtering

The data collected from some chemical sensors are sometimes accompanied by glitches and relatively high frequency noise (compared to the signal frequency). Here, the signal to noise ratio (SNR) is often important for pattern recognition, especially when concentrations of analytes are low, exceedingly high, or not within a predefined range of windows. In such cases, it is important to boost the signal to noise ratio using the present digital filtering technology. Multiple digital filters have been implemented in the Simulation, e.g., Zero Phase Filter, “zero phase”, Adaptive Exponential Moving Average Filter, “exp-mov-avg”, and Savitzky-Golay Filter, “savitzky-go”. In operation, the mouse can be used to pull down an arrow 525, which displays the filters 527. The mouse is used to highlight one of the filters to select it. In some embodiments, the program will run that digital filter immediately after the mouse is released. As merely an example, some details of such filters are set forth below.

-   (a) Zero-Phase Filter uses the information in the signal at points before and after the current point, in essence “looking into the future,” to eliminate phase distortion. Zero-Phase Filter uses the z-transform of a real sequence and the z-transform of the time reversed sequence. Preferably, the sequence being filtered should have a length of at least three times the filter order and should taper to zero on both edges.
-   (b) Savitzky-Golay Filter performs Savitzky-Golay smoothing by fitting a simple polynomial to a running local region of the sample vector. At each increment, a polynomial of the specified order is fitted to the points (window) surrounding the increment.
-   (c) Both Zero-Phase Filter and Savitzky-Golay Filter are post data process type filters. In contrast, Adaptive Exponential Moving Average Filter can be used as a real-time filter. It does not need to store the whole scan of data in memory and then process it. Currently the filter window is set at 11 points, and it was found that Savitzky-Golay Filter gives a good result of data smoothing without significant distortion.

Although the above has been generally described in terms of specific filters, those of skill in the art will be aware of other filters suitable for use in the present invention.
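The difference between a real-time filter and a post-processing filter can be illustrated with a short Python sketch. It is a simplification, not the Simulation's implementation: the moving average below uses a fixed smoothing factor `alpha` (the adaptive rule is not described here), and the zero-phase variant simply runs that same filter forward and then backward over the stored scan. All names and values are assumptions.

```python
def exp_moving_average(samples, alpha=0.2):
    """Causal exponential moving average; needs only the previous output."""
    smoothed = []
    previous = None
    for value in samples:
        if previous is None:
            previous = value  # start from the first sample
        else:
            previous = alpha * value + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed


def zero_phase(samples, alpha=0.2):
    """Filter forward, then backward, so the two phase lags cancel."""
    forward = exp_moving_average(samples, alpha)
    backward = exp_moving_average(forward[::-1], alpha)
    return backward[::-1]


# Hypothetical step response with alternating noise added.
signal = [0.0] * 5 + [1.0] * 5
noisy = [s + (0.1 if i % 2 else -0.1) for i, s in enumerate(signal)]
realtime = exp_moving_average(noisy)  # can run sample by sample
offline = zero_phase(noisy)           # needs the whole scan in memory
```

The forward-only filter can process each point as it arrives, whereas the forward-backward pass needs the complete scan, which is why it belongs to the post data process type.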

7. Viewing Sensor Responses

Sensor responses can be viewed using the present GUI 503, which illustrates ΔR/R against time in seconds. Another pop-up menu 531 on the left is called “Figure List”. A click on the arrow 529 displays a list from 1 to 16. Each figure has the responses of four sensors in order. For example, figure 1 contains responses of sensors 1 to 4. Likewise, figure 2 contains responses of sensors 5 to 8. When the mouse arrow is moved to highlight figure number 3, a response plot of sensors 9 to 12, with both filtered and unfiltered data, is displayed on the graphical window as shown in the diagram of FIG. 5B, for example. Like reference numerals are used in this Figure as in the previous Figure for easy referencing, without limiting the scope of the claims herein. As shown, the diagram illustrates a filter response 541 for each of the sensors (e.g., sensor 9, sensor 10, sensor 11, sensor 12) in the array. Here, the filtered data are usually in dark colors, such as red, blue, and black. If the data set is huge and has many exposures, the plot will be packed with response peaks and it could be hard to view the detail. By way of the present example, it is possible to view the detail of data preprocessing. The example also allows noise levels for each of the sensors to be examined. Additionally, the example illustrates how well the filter worked. The example also shows how the sensor responds to different analytes within a certain exposure time. The example also allows us to examine how the baselines drift (which is, for example, a nominal change in sensor resistance over time). In these examples, it may be desirable to load a piece of data, such as six exposures or fewer along the horizontal time axis, as shown. Once the piece of data has been loaded, pre-processing can be performed. Using, for example, Wordpad by Microsoft Corporation, it is possible to cut and paste the data to create a subset of the data file.

Once the desired filter has been found and used, the present method goes to a baseline correction step, as indicated below.

8. Baseline Correction

Depending upon the embodiment, there can be many different ways to implement a baseline correction method. In the present example, three methods for baseline correction have been implemented in the simulation. These correction methods were called “min max”, “baseline corr”, and “extrapolate”. Selection occurred by clicking 533 the popup menu of “baseline corr”, and selecting 534 one of the methods. The program, guided by the flags set in the data file, runs the baseline correction method according to the user's choice, finds the response peaks, calculates the ΔR/R, and plots the ΔR/R vs. time stamps. It also calculates the maximum ΔR/R and the maximum slope of ΔR/R for further processing. As shown in FIG. 5C, the responses of all the sensors after baseline correction are displayed 503. In the graph, 32 traces of sensor responses with six exposures vs. time are plotted. As noted, the baseline drift 543 has been corrected as shown in FIG. 5C as compared to the responses in the previous Figures, which illustrate varying baseline displays. Weighting, such as Zero-Weighting on insignificant signals, is also included in the program. The threshold has been set at SNR equal to three. Once baseline drift has been corrected, the present method undergoes a normalization process, although other processes can also be used.
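The quantities named above (ΔR/R, its maximum, its maximum slope, and zero-weighting below an SNR of three) can be sketched in Python under stated assumptions. None of the three named correction methods is specified in detail, so this sketch assumes the baseline resistance is the mean of the first few pre-exposure points and that noise is the standard deviation of ΔR/R over those same points. The function names and the data are hypothetical.

```python
from statistics import mean, pstdev


def delta_r_over_r(resistance, baseline_points=3):
    """Relative response (R - R0) / R0, with R0 taken from the leading points."""
    r0 = mean(resistance[:baseline_points])  # assumed baseline estimate
    return [(r - r0) / r0 for r in resistance]


def peak_features(resistance, times, baseline_points=3, snr_threshold=3.0):
    """Return (max response, max slope), zero-weighted when SNR is too low."""
    response = delta_r_over_r(resistance, baseline_points)
    peak = max(response)
    slope = max((response[i + 1] - response[i]) / (times[i + 1] - times[i])
                for i in range(len(response) - 1))
    noise = pstdev(response[:baseline_points])  # assumed noise estimate
    snr = peak / noise if noise > 0 else float("inf")
    if snr < snr_threshold:
        return 0.0, 0.0  # insignificant signal: zero-weight it
    return peak, slope


# Hypothetical sensor trace: three baseline points, then an exposure.
times = [0, 1, 2, 3, 4, 5]
resistance = [100.0, 100.2, 99.8, 104.0, 110.0, 110.0]
peak, slope = peak_features(resistance, times)
```

A trace that never rises above its own baseline noise returns zeros, so it does not contribute to the response pattern.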

9. Normalization

Normalization is provided in the following manner. Here, the user clicks on the popup menu of Normalization and three choices, “none”, “1-norm”, and “2-norm”, appear, as illustrated in part in FIG. 5D. Depending upon the embodiment, other choices may also appear. The convention of the data matrix after the baseline correction is to set samples (exposures) along the rows and variables (sensors) along the columns. The normalization is a row-wise operation. 1-norm is the so-called area normalization. After 1-norm, the sum of data along each row is unity. 2-norm is the so-called vector length normalization. After 2-norm, the sum of data squared of each row equals unity. From studies, it is concluded that the ΔR/R of the sensor is proportional to the concentration if the sensor reaches equilibrium during the exposure time. Theoretically, the normalization of such data should yield the same response pattern even if the sensor is exposed to a different sample concentration.
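The two row-wise normalizations can be sketched as follows in Python. The function name `normalize_rows` and the sample values are assumptions; the 1-norm divides by the sum of absolute values, which equals the plain row sum for the positive responses considered here. The example also illustrates the concentration argument: a row that is a scaled copy of another normalizes to the same pattern.

```python
import math


def normalize_rows(matrix, kind="1-norm"):
    """Scale each row (sample) so its 1-norm or 2-norm equals one."""
    result = []
    for row in matrix:
        if kind == "none":
            scale = 1.0
        elif kind == "1-norm":
            scale = sum(abs(v) for v in row)            # area normalization
        elif kind == "2-norm":
            scale = math.sqrt(sum(v * v for v in row))  # vector length
        else:
            raise ValueError(f"unknown normalization: {kind}")
        scale = scale or 1.0  # leave an all-zero row unchanged
        result.append([v / scale for v in row])
    return result


# Hypothetical responses of three sensors at a low and a 5x higher concentration.
low = [0.01, 0.03, 0.06]
high = [0.05, 0.15, 0.30]
area = normalize_rows([low, high], "1-norm")
length = normalize_rows([low, high], "2-norm")
```

After either normalization the two rows coincide up to rounding, which is the behavior expected when ΔR/R is proportional to concentration.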

Here, a pseudo-color graph of 1-norm data is shown in the simplified diagram of FIG. 5D with a color bar. The graph is plotted as sensor number vs. sample number. The peaks are marked red and the valleys are in dark blue. The pattern in the graph is repeated as samples are counted from 1 to 6. Up to this step, the training data set has been created. Click on the workspace window to bring it to the front and type “whos,” and the data set called trainpk, with its variable and size information, will be displayed on the workspace.

10. Viewing Plots

The present method also allows for viewing the plots in a variety of different configurations, as illustrated in FIG. 5E. The popup menu of Viewing Plots will not alter the data of “trainpk”, but will allow different plots to be viewed, such as 2D spectra, 3D plots of sensors, mean-centered, and auto-scaled. One of the useful plots is the 2D spectra plot that is shown in FIG. 5E. By keeping these plots in the file folder, any sensor can be followed for drift and checked for consistency of sensor responses day after day.

11. Save Preprocessed Data

To save the preprocessed data, trainpk, it can first be assigned to a variable with a new name and then saved to a mat file or ascii file. If a file called ttb1122 is to be saved, the following can be entered in the command window:

-   >>ttb1122=trainpk;
-   >>save ttb1122 ttb1122;
-   A ttb1122.mat file is saved in the “work” folder, or
-   >>save ttb1122 ttb1122 -ascii;
-   A ttb1122.txt file is saved in the “work” folder.

12. Auto Preprocessing

After having gone through all the preprocessing steps, the preprocessing choices have been selected. The GUI shows the choices on their popup windows and keeps them intact. In certain aspects, it is desirable to preprocess many data sets. Here, the auto mode can be run by pressing the “Load Unknown” button at the bottom left of the GUI. The program follows the previously set preprocessing steps and runs automatically, but can also be run semi-automatically. The resulting matrix is called samplepk. To save samplepk, it can first be assigned to a variable with a new name and then saved to a mat file or ascii file in the same way as trainpk, for example:

-   >>ttb1123=samplepk;
-   >>save ttb1123 ttb1123.

On the top-right panel, there is a list box, “Select Class”, and a few push buttons, “Load Training”, “Save”, and “Load”. If each data file is in one class, these buttons can be used to run auto preprocessing. Here is the procedure:

-   (a) Use the mouse button to highlight class info in the list box on the top-right panel, e.g., Class 1 or Class 2 or . . .
-   (b) Push the “Load Training” button. The GUI will automatically run through the preprocessing steps and use PCA to screen and delete any outlier. If the number of samples in that class is less than ten, the program will ask for more samples belonging to that class to be loaded. In that case, it is desirable to push the “Load Training” button again.
-   (c) Use the mouse button to highlight another class info in the list box.
-   (d) Push the “Load Training” button to load samples belonging to that class.
-   (e) Repeat the same procedure until all the samples have been loaded.
-   (f) The result is that the training set matrix, trainpk, and class vector, class, have been created in the workspace.
-   (g) Pushing the “Save” button will save trainpk and class into a mat file with a different file name.
-   (h) Later on, if the “Load” button is pushed, the file can be reloaded into the workspace.

13. Comments on Data Preprocessing

To perform pattern recognition, the choices of preprocessing for all the data sets must often be consistent; otherwise the prediction will generally not work in an efficient manner. To build a model from a training set, the matrix is assigned the name of trainpk, for example. Here, the number of samples in each class is maintained the same. A class info vector called class is created unless the right panel is used for data preprocessing. For the turn-table data with six classes, assign class=[1 2 3 4 5 6 1 2 3 4 5 6 . . . ]. For the labnose data, assign class=[1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 . . . ]. In certain instances, it is desirable to make trainpk from data set ttb1122 and to tailor it; thus, type:

-   >>trainpk=ttb1122(13:72,:).

Then trainpk will have 60 rows from row 13 to 72 of the matrix ttb1122. To do prediction, assign the unknown data set (matrix) to the name of samplepk. Thereafter, type >>samplepk=ttb1123(13:18,:). Then samplepk will consist of six rows of the matrix ttb1123.
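The class vectors and row selection above can be sketched in Python. The helper names and the 78-row stand-in for ttb1122 are assumptions; the point is the two class orderings (cycling for the turn-table data, blocked for the labnose data) and the translation of MATLAB's 1-based, inclusive 13:72 range into a slice.

```python
def turntable_classes(n_classes, n_cycles):
    """Classes cycle 1..n for each pass: [1 2 3 ... n 1 2 3 ...]."""
    return [c for _ in range(n_cycles) for c in range(1, n_classes + 1)]


def blocked_classes(n_classes, per_class):
    """Classes come in blocks: [1 1 ... 2 2 ... 3 3 ...]."""
    return [c for c in range(1, n_classes + 1) for _ in range(per_class)]


def rows(matrix, first, last):
    """Equivalent of matrix(first:last, :) with 1-based inclusive bounds."""
    return matrix[first - 1:last]


# Hypothetical stand-in for ttb1122: 78 rows x 32 sensors, row r filled with r.
ttb1122 = [[float(r)] * 32 for r in range(1, 79)]
trainpk = rows(ttb1122, 13, 72)          # 60 rows, as in the text
classes = turntable_classes(6, 10)       # one label per row of trainpk
```

The class vector must have exactly one entry per row of trainpk, which is the compatibility condition the pattern recognition programs rely on.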

The data preparation has been described in this section. As long as trainpk and the class vector are compatible, the program is then ready to run the pattern recognition programs.

14. Pattern Recognition

The popup menu “Pattern Recogn” 511 at the middle of the right panel initiates the pattern recognition algorithms. Click on the arrow 511 to see a pull-down menu with all the abbreviations as described in Table I above. As discussed above, the top two menus, PCA and HCA, are unsupervised learning methods. They are used for investigating training data. The other four algorithms, KNN CV, SIMCA CV, Canon CV, and Fisher CV, are supervised learning methods used when the goal is to construct models to be used to classify future samples. These algorithms will do cross validation, find the optimum number of parameters, and build models.

15. Principal Components Analysis (PCA)

Principal Component Analysis (PCA) is an unsupervised method that reduces the number of required variables to analyze similarities and differences amongst a set of data. The method produces a scores plot for this analysis. The number of principal components (PC's) is automatically determined. Each axis of the graph is assigned a PC number, and the percent variance captured with the particular PC is shown along the axis.
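The scores and the percent variance captured by each PC can be illustrated with a small, dependency-free Python sketch. It is not the toolbox's algorithm: it assumes mean-centered data, finds each component by power iteration on the covariance matrix, and removes it by deflation. The starting vector is arbitrary and, in contrived cases, could be orthogonal to a component. All names and data are hypothetical.

```python
import math


def mean_center(matrix):
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def covariance(centered):
    n, p = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(p)] for i in range(p)]


def leading_eigenpair(cov, iterations=500):
    """Power iteration: repeatedly multiply and renormalize a trial vector."""
    p = len(cov)
    vec = [float(i + 1) for i in range(p)]  # arbitrary starting direction
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            return 0.0, vec  # no variance left in any direction
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                for i in range(p))
    return value, vec


def pca(matrix, n_components):
    """Return (scores, percent variance captured per component)."""
    centered = mean_center(matrix)
    cov = covariance(centered)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))  # total variance
    components, percents = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        components.append(vec)
        percents.append(100.0 * value / total)
        # Deflate: remove the captured direction before the next pass.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(v * c for v, c in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, percents


# Hypothetical data: two perfectly correlated sensors, four samples.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
scores, percents = pca(data, 2)
```

Here the second sensor is an exact multiple of the first, so essentially all of the variance falls on the first component; real sensor data would spread variance over several PCs.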

PCA of data may be performed utilizing a number of software programs. One such program is the PLS_Toolbox available from Eigenvector Research, Inc. of Manson, Wash. To perform PCA using this tool, highlighting “PCA” in the popup menu of “Pattern Recogn” opens a PCA GUI. From the top menu bar of that GUI, click on PCA_File, and highlight Load Data. The file trainpk can be selected to load into the PCA program. When it is done, the window looks similar to output 550 in FIG. 5F. The top-left corner 557 shows that trainpk has been loaded with size 60 rows×32 columns. Once the push button calc 558 has been clicked, the program runs PCA, calculates eigenvalues and eigenvectors, and lists the percent variance captured by the PCA model as shown. From the table 559, it can be seen that four principal components have already captured 96.05% of the variance. Using more PCs may not improve the PCA model much but may capture more noise. For example, in certain instances, it is desirable to choose four PCs. Thus, click on the line of 4 PCs 561. That line of data will be highlighted, as shown. Next, click on the button apply 563, and the model with four PCs is calculated. Five plot push buttons 551 (eigen 552, scores 553, loads 554, biplot 555, and data 556) are then highlighted.

In other aspects, push the button “scores,” choose to plot PC1 vs. PC2, and see a Scores Plot as displayed in the spatial configuration of FIG. 5G. Here, the figure shows that the training data has six classes, which are grouped well except for class 1 and class 6, which overlap a little. In some embodiments, a 3D plot can be made by choosing three PCs to plot. To print a hard copy, the “spawn” button is selected to create a separate plot window, which can be printed.

FIGS. 5K and 5L show alternative approaches for performing PCA. FIG. 5K shows a three-dimensional Scores Plot 590. FIG. 5L shows a graphic user interface for this approach, wherein clicking the arrow of “Pattern Recogn” and highlighting “PCA” causes a pop-up window to appear. This pop-up window allows the user to select the method of pre-processing (i.e. no pre-processing, mean-center, or auto-scale). As shown in FIG. 5L, the Scores Plot then appears. In the menu option, the user may select “zoom in”, “zoom out”, or “rotate” to change the view of the scores plot in the graphical display.

16. Mean Centering and Autoscaling

The default setting in the PCA GUI is autoscaling. From the menu bar of the PLS_Toolbox application, by selecting PCA_Scale, the method can change among no scaling, mean center, and autoscaling. PCA is scale dependent, and numerically larger variables appear more important in PCA. In certain instances, the data that varies around the mean is of interest. Mean centering is done by subtracting the mean from the variables in each column, thus forming a matrix where each column has a mean of zero. Autoscaling is done by dividing each variable (already mean centered) in each column by its standard deviation. The variables of each column of the resulting matrix have unit variance. The button, auto CV, will run the algorithms with mean centering and autoscaling to do cross validation and find out what combination gives the best prediction.
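Mean centering and autoscaling as described above can be sketched in a few lines of Python. The function names and data are assumptions, and the sample standard deviation (dividing by n − 1) is assumed since the text does not say which form is used.

```python
from statistics import mean, stdev


def mean_center(matrix):
    """Subtract each column's mean so every column averages to zero."""
    means = [mean(col) for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def autoscale(matrix):
    """Mean center, then divide each column by its standard deviation."""
    centered = mean_center(matrix)
    spreads = [stdev(col) for col in zip(*centered)]
    return [[v / s if s > 0 else 0.0 for v, s in zip(row, spreads)]
            for row in centered]


# Hypothetical data: sensor 2 is numerically about 100x larger than sensor 1.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
centered = mean_center(data)
scaled = autoscale(data)
```

After autoscaling both sensors have unit variance, so the numerically larger sensor no longer dominates a scale-dependent method such as PCA.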

17. Hierarchical Cluster Analysis (HCA)

Hierarchical cluster analysis (HCA) is an unsupervised technique that examines the inter-point distances between all of the samples, and presents that information in the form of a two-dimensional plot called a dendrogram as shown in FIG. 5H. To generate the dendrogram, HCA forms clusters of samples based on their nearness in row space. When the arrow of “Pattern Recogn” is clicked and “HCA” is highlighted, the GUI enables different approaches to measure distances between clusters, e.g., mean centering vs. autoscaling; single vs. centroid linking; run PCA vs. not run PCA; Euclidean vs. Mahalanobis distance.

After the HCA has been run, the mini window and the workspace list all the links from the shortest distance to the longest distance. The clustering information is also shown in the dendrogram. The ordinate presents sample numbers and their class info, while the abscissa gives distances between sample points and between clusters. The six classes are well observed in that graph. The distances between sample points and between clusters can be found from the abscissa.
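The list of links from shortest to longest distance can be illustrated with a minimal agglomerative clustering sketch in Python. It assumes one of the options named above, single linking with Euclidean distance, and uses hypothetical two-dimensional samples; it is not the Simulation's implementation.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_link(samples):
    """Merge the two closest clusters repeatedly; return links in merge order."""
    clusters = [[i] for i in range(len(samples))]
    links = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linking: distance between the closest pair of members.
                d = min(euclidean(samples[a], samples[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        links.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return links


# Hypothetical samples: two tight pairs and one isolated point.
samples = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]]
links = single_link(samples)
```

Each entry of `links` records which two clusters were joined and at what distance, which is the information a dendrogram draws along its distance axis.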

18. Auto Cross Validation

The method also performs a cross validation technique. Here, click the button, “Auto CV,” and the Simulation GUI will run cross validation using all the supervised techniques with the combination of either mean centering or autoscaling. The Auto CV finds the optimum combination of scaling and algorithm, the optimum number of principal components, and the optimum K in KNN CV. The results of the top five predictions from Auto CV are presented in the mini window as shown in FIG. 5I. It may be desirable to use the information to construct other models to get better classification.

In the Simulation program, an auto cross-validation algorithm has been implemented. Cross-validation is a process used to validate models built with chemometrics algorithms based on a training data set. During the process, the training data set is divided into calibration and validation subsets. A model is built with the calibration subset and is used to predict the validation subset. One approach of dividing the training data set into calibration and validation subsets is called “leave-one-out”, i.e., take one sample out from each class to build a validation subset and use the remaining samples to build a calibration subset. This process is repeated using different subsets until every sample in the training set has been included in one validation subset. The predicted results are stored in an array. Then, the correct prediction percentages (CPP) are calculated, and are used to validate the performance of the model.

In the Simulation program, the cross-validation with one training data set can be applied to all the models built with different algorithms, such as K-Nearest Neighbor (KNN), SIMCA, Canonical Discriminant Analysis, and Fisher Linear Discriminant Analysis, respectively. The results of correct prediction percentages (CPP) show the performance differences with the same training data set but with different algorithms.

During the model building, there are several parameters and options to choose. To build the best model with one algorithm, cross-validation is also used to find the optimum parameters and options. For example, in the process of building a KNN model, cross-validation is used to validate the models built with different values of K, different scaling options, e.g., mean-centering or auto-scaling, and other options, e.g., with PCA or without PCA, to find out the optimum combination of K and other options.

Auto-Cross-Validation has been implemented in the Simulation GUI via one push-button. It will automatically run the processes mentioned above over all the algorithms with the training data set to find out the optimum combination of parameters, scaling options and algorithms. Using that information, it is possible to build a model to get better classification capability.
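The cross-validation loop described above can be sketched in Python for the KNN case. This is an illustration under assumptions, not the Simulation's code: each fold holds out one sample from every class, a plain majority-vote KNN with Euclidean distance serves as the model, and the correct prediction percentage (CPP) is compared across a few hypothetical values of K. Equal class sizes are assumed, as stated earlier for trainpk.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k):
    """Majority vote among the k nearest calibration samples."""
    def distance(point):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(point, sample)))
    nearest = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(samples, labels, k):
    """Hold out the j-th sample of every class in turn; return the CPP in percent."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = min(len(members) for members in by_class.values())
    correct = tested = 0
    for j in range(folds):
        held_out = {members[j] for members in by_class.values()}
        calib = [i for i in range(len(samples)) if i not in held_out]
        train_x = [samples[i] for i in calib]
        train_y = [labels[i] for i in calib]
        for i in held_out:
            correct += knn_predict(train_x, train_y, samples[i], k) == labels[i]
            tested += 1
    return 100.0 * correct / tested


def best_k(samples, labels, candidates=(1, 3, 5)):
    """Cross-validate each candidate K and return the best one with all CPPs."""
    results = {k: cross_validate(samples, labels, k) for k in candidates}
    return max(results, key=results.get), results


# Hypothetical training set: two well-separated classes, four samples each.
samples = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2],
           [1.0, 1.1], [1.1, 1.0], [1.2, 1.1], [1.1, 1.2]]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
k_opt, cpp = best_k(samples, labels)
```

The same outer loop could wrap other scaling options or algorithms, with the CPP table then showing which combination predicts the held-out samples best.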

19. Construct Models

In some embodiments, the method constructs models. Here, click the popup menu, “SIMCA CV,” and the Simulation GUI will construct a SIMCA model based on the choice of scaling. After it is done, the graph window shows the plots of Q vs. T² of each class, and the mini window displays that 4 PCs have been chosen to construct the model and that the predictions of cross validation are, say, 100% correct. A data structure (the model) named simcamod has been created in the workspace, as can be seen if whos is typed in the workspace. A KNN Model, knnmod, Canonical Model, canmod, and Fisher Linear Discriminant Model, fldmod, can be constructed in the same way by clicking and highlighting the respective popup menus. Typing whos shows how many models there are in the workspace, as illustrated by FIG. 5J.

20. Make Predictions

The unknown samples to be predicted are named samplepk. In certain aspects, there are two ways to make unknown samples, samplepk:

Push the “Load Unknown” button, and the Simulation GUI will load unknown samples from a raw data file, preprocess them automatically, and create samplepk.

Tailor the preprocessed data as mentioned before and assign it to samplepk, such as >>samplepk=ttb1123(13:18,:).

To make a prediction, click the popup menu and highlight the corresponding menu item to initiate a prediction run. KNN Prd will run the KNN model on the unknown samples, and present the prediction results in the mini command window. The prediction results will be like:

Unknown 1 belongs to class 1; Goodness Value=−0.8976

Unknown 2 is close to class 2; Goodness Value=4.8990

If the Goodness value is less than 4, the unknown will be considered as belonging to that class.

Click on the buttons of SIMCA Prd, Canon Prd, and Fisher Prd, respectively, and the Simulation GUI will do the same. The prediction results with the information of probabilities or confidence levels will be presented in the mini command window.

SIMCA Prd gives predictions with rms normalized distance levels. If the level is greater than 1.414, the unknown is not considered as belonging to that class, but it is close to that class.

Canon Prd provides predictions with probability level values. If the probability level is less than 0.99, the unknown sample is considered as belonging to that class; otherwise, it will be reported as belonging to the closest class.
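The three decision rules above can be collected into a short Python sketch. The function names are hypothetical, the thresholds are the ones quoted in the text, and the behavior exactly at a threshold is an assumption since the text does not specify it. How the Goodness value, rms distance, or probability level is computed is outside this sketch.

```python
def knn_verdict(goodness, limit=4.0):
    """KNN Prd: a Goodness value below the limit means class membership."""
    return "belongs to" if goodness < limit else "is close to"


def simca_verdict(rms_distance, limit=1.414):
    """SIMCA Prd: beyond the limit the unknown is only close to the class."""
    return "is close to" if rms_distance > limit else "belongs to"


def canon_verdict(probability_level, limit=0.99):
    """Canon Prd: below the limit the unknown belongs to the class."""
    return "belongs to" if probability_level < limit else "is closest to"


def report(index, class_id, goodness):
    """Format a KNN result line in the style of the mini command window."""
    return (f"Unknown {index} {knn_verdict(goodness)} class {class_id}; "
            f"Goodness Value={goodness:.4f}")


first = report(1, 1, -0.8976)
second = report(2, 2, 4.8990)
```

Applied to the two example results quoted above, the first unknown is reported as belonging to class 1 and the second only as close to class 2.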

While the invention has been described with reference to certain illustrated embodiments, this description is not intended to be construed in a limiting sense. For example, the computer platform used to implement the above embodiments may include 586 class based computers, Power PC based computers, Digital ALPHA based computers, Sun Microsystems SPARC computers, etc.; computer operating systems may include WINDOWS NT, DOS, MacOS, UNIX, VMS, etc.; programming languages may include C, C++, Pascal, an object-oriented language, HTML, XML, and the like. Various modifications of the illustrated embodiments as well as other embodiments of the invention will become apparent to those persons skilled in the art upon reference to this description.

In addition, a number of the above processes can be separated or combined into hardware, software, or both and the various embodiments described should not be limiting. As will be appreciated by one of skill in the art, the present invention can be embodied as a method, data processing system, or computer program product. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product on a computer-usable storage medium having computer-usable program code embodied in the medium. Any suitable computer readable medium can be utilized including hard disks, CD-ROMs, optical storage devices, or magnetic storage devices. It will be understood, therefore, that the invention is defined not by the above description, but by the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes in their entirety.

1-50. (canceled)
51. A computer system comprising: a) a process manager; b) an input module coupled to the process manager for providing to a computing device a first data from a first sensing device and a second data from a second sensing device, wherein the first and second sensing devices are connected to the input module over a computer network; and c) a pattern recognition module coupled to the process manager for processing the first and second data using a pattern recognition algorithm to classify or identify a substance.
52. The system of claim 51, wherein the first data and second data each comprise characteristics selected from olfactory information, temperature, color, and humidity.
53. The system of claim 51, wherein the pattern recognition is a Fisher Linear Discriminant Analysis.
54. The system of claim 51, wherein the first data and the second data can be selected from a transient stream of data or from a static source of data.
55. The system of claim 51, wherein the first data and the second data are each captured from an array of olfactory sensors.
56. The system of claim 55, wherein the olfactory sensors are comprised of a polymer component.
57. The system of claim 51, wherein the first data and the second data are provided through a worldwide network of computers, the worldwide network of computers comprising the Internet.
58. A system comprising memory including a computer code product, the memory comprising: a) a code directed to acquiring over a computer network a first data from a first sensing device, wherein the first sensing device comprises at least one chemical, biological, or radiation sensor; b) a code directed to acquiring over a computer network a second data from a second sensing device, wherein the second sensing device comprises at least one chemical, biological, or radiation sensor, and c) a code directed to applying a pattern recognition algorithm to the first data and second data to classify or identify a substance.
59. A method comprising: a) acquiring over a computer network a first data from a first sensing device, wherein the first sensing device comprises at least one chemical, biological, or radiation sensor; b) acquiring over a computer network a second data from a second sensing device, wherein the second sensing device comprises at least one chemical, biological, or radiation sensor; c) storing the first data and second data in memory; and d) applying a pattern recognition algorithm to the first data and second data to classify or identify a substance.